-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: add downcast to pd.to_numeric #13425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current coverage is 84.31%@@ master #13425 diff @@
==========================================
Files 138 138
Lines 51157 51176 +19
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43131 43151 +20
+ Misses 8026 8025 -1
Partials 0 0
|
@@ -2097,3 +2097,35 @@ def _random_state(state=None): | |||
else: | |||
raise ValueError("random_state must be an integer, a numpy " | |||
"RandomState, or None") | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put in pandas.types.missing.py
(new file); let's not add anymore to core/common.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Done.
06612d6
to
052f09e
Compare
@jreback : Refactored |
f775312
to
0a503af
Compare
lgtm. @jorisvandenbossche small issue is whethere the kwargs are the right names; further maybe:
|
@jreback : I see what you're saying, but the decoupling would raise the issue of which one takes precedence? i.e. what happens if for some reason or another, someone passes in |
0a503af
to
943eaeb
Compare
@gfyoung that would be an error, both options cannot be True. |
@jreback : hmmm, on the fence regarding this - will wait to see what @jorisvandenbossche says |
943eaeb
to
1a720ca
Compare
# compact int64 "NaN" value | ||
int8_na = na_values[np.int8] | ||
int64_na = na_values[np.int64] | ||
s = [int64_na, 2, 3] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the "NaN" like values is the current lib.downcast_int64
behaviour, but I am not sure we should put that in user-facing code?
Because that means you are actually changing the content (and they are currently not used as NaNs)
At the moment, the implementation in this PR is not really a substitute for the Of course it could be easily changed that the keyword is also used when the dtype is already numeric (the |
@jorisvandenbossche I view alternatively, maybe add these instead to and @gfyoung certainly remove the non int64-sentinels though. |
9a01afa
to
13b518d
Compare
@jorisvandenbossche : that is not correct - downcasting occurs so long as the converted data is of integer dtype, irregardless of its starting dtype i.e. data that is already integral can also be downcasted. This is true in While I do see what you are saying that this change could be viewed as expanding the functionality beyond what it was originally intended for, |
@gfyoung Looks good! Made a couple of doc comments Last comment: I would personally just choose one of |
9103251
to
bc188c7
Compare
Hmmm...seems reasonable enough @jreback : should we choose one of |
bc188c7
to
fdc176d
Compare
@@ -1788,6 +1788,46 @@ but occasionally has non-dates intermixed and you want to represent as missing. | |||
In addition, :meth:`~DataFrame.convert_objects` will attempt the *soft* conversion of any *object* dtypes, meaning that if all | |||
the objects in a Series are of the same type, the Series will have that dtype. | |||
|
|||
:meth:`~pandas.to_numeric` is very similar to `pd.to_datetime`, except that it is for conversion to numerical data: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can make pd.to_datetime
a function reference as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, done.
7215b74
to
6538f3d
Compare
1) :meth:`~pandas.to_datetime` (conversion to datetime objects) | ||
|
||
.. ipython:: python | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just use lists here - no need to use a numpy array
also say these operate on 1dim things (or scalars)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Done.
6538f3d
to
4758dcc
Compare
@jreback : Travis is passing, so ready to merge if there are no other concerns. |
thanks @gfyoung another nice PR! though we DO seem to be having long discussions :) |
give a look on the docs once built. I edited slightly. |
@jreback: well, I think the long discussion was quite beneficial. We caught a bug in the Travis build script and made the API slightly more compact (one arg instead of two). Will take a look at the doc. |
Title is self-explanatory. Closes pandas-dev#13352. Author: gfyoung <[email protected]> Closes pandas-dev#13425 from gfyoung/to-numeric-enhance and squashes the following commits: 4758dcc [gfyoung] ENH: add 'downcast' to pd.to_numeric
Title is self-explanatory. Closes #13352.